Abstract

Problem partitioning of regular computation over two-dimensional meshes on mul- 
tiprocessor systems is examined. The regular computation model considered involves 
repetitive evaluation of values at each mesh point with local communication. The 
computational workload and the communication pattern are the same at each mesh 
point. The regular computation model arises in numerical solutions of partial differen- 
tial equations and simulations of cellular automata. Given a communication pattern, 
a systematic way to generate a family of partitions is presented. The influence of var- 
ious partitioning schemes on performance is compared on the basis of computation to 
communication ratio. 
1 Introduction


Applying parallel processing to computationally intensive problems has been of
much interest in recent years. There are many scientific and engineering problems in
which the major computation structure is regular. This kind of regularity is a great
advantage contributing to the good performance of many parallel implementations.

The regular computation model considered involves repetitive evaluation of values 
at each mesh point with local communication. The computational workload and the 
communication pattern are the same at each mesh point. This class of computations 
naturally arises in numerical solutions of partial differential equations and simulations 
of cellular automata. 

The numerical solution of partial differential equations (PDE), by methods such 
as point Jacobi iteration, involves evaluation of the value at each mesh point at each 
iteration as the weighted sum of the previous values of its neighbors. The pattern of 
communicating neighbors is called the stencil. For example, if only the values of the
north, south, east and west neighbors of a point are needed, the stencil used is called a
5-point stencil. Interestingly, but perhaps not surprisingly, a new non-PDE approach for
solving physical problems also shares the characteristic of regular computation. Recent
research in physics has shown that lattice gas cellular automata [1] have great
potential for simulating fluid flow phenomena. A cellular automaton consists of cells
possessing discrete values. At each cycle, the value of a cell is evaluated as a function 
of the values of itself and its neighbors. 

When this kind of regular computation is implemented on a multiprocessor system, 
it is generally preferable to divide the data space (mesh points) into partitions, and 
assign each partition to a different processor such that only the values of the boundary 
points of a partition have to be accessed by other processors [4,2]. Since performance 
is affected by both the computation and communication costs, the shape of partitions 
can have an important effect on performance.

Historically, rectangular or square partitions have most commonly been assigned 
to processors, primarily because the resulting data structures can be easily indexed as 
two-dimensional arrays. Vrsalovic, et al. [4] considered the solution of Poisson’s equa- 
tion over a square region using a 5-point discretization stencil. They tested triangular, 
square, and hexagonal partitions. Reed, Adams and Patrick [3] conducted an analyt- 
ical study on selecting optimal stencil/partition pairs. They considered rectangular, 
triangular, square and hexagonal partitions. If computation to communication ratio 
is used as the criterion for comparison, they found that square partitions are best for 
9-point star stencils, hexagonal partitions (see footnote 1) are best for 5-point stencils, 9-point cross
stencils and 13-point stencils, and square and hexagonal partitions are equally good for
7-point stencils.

Footnote 1: As explained in Section 2, this kind of hexagon will be referred to as R-hex.

Figure 1: Five commonly used stencils

In this report, we will study problem partitioning of regular computation over two- 
dimensional meshes. We will show that for the various stencils (communication pat- 
terns) considered, there are other shapes of partitions which achieve higher computation 
to communication ratios than those previously discussed in the literature. In section 2, 
a systematic way to generate families of partitions using the concept of stencil neigh- 
borhood is presented. Section 3 discusses the properties of the partitions. Finally, in 
section 4 the computation to communication ratio is used as a metric to compare the 
performance of different partitioning schemes under various choices of stencils.

2 Generation of Partitions 

In solving problems belonging to the class of regular computation, there is usually a 
choice of stencils. Figure 1 shows several commonly used stencils for two-dimensional 
meshes. The stencils to be considered in this report are the 5-point stencil, the 7-point 
stencil, the 9-point star stencil, the 9-point cross stencil and the 13-point stencil. 

Let us define a partition to be a set of points in the two-dimensional space $Z^2$, where
$Z$ is the set of integers. The neighborhood, $N(p)$, of a point $p$ is a set which contains
the point itself and some points positioned relative to the point, where $N$ is called
the neighborhood function. With this notation, we may denote the corresponding
neighborhood functions of the stencils considered above as $N_5$, $N_7$, $N_{9s}$, $N_{9c}$, and $N_{13}$
respectively. For example, we can express $N_5$ and $N_7$ as follows:

$$N_5(p) = \{p,\ p + (1,0),\ p + (0,1),\ p + (-1,0),\ p + (0,-1)\}$$

$$N_7(p) = \{p,\ p + (1,0),\ p + (0,1),\ p + (-1,0),\ p + (0,-1),\ p + (1,1),\ p + (-1,-1)\}$$

Similarly, the neighborhood functions for the other stencils can be written down easily.

The extension of a partition P under the neighborhood function N is defined to be 

$$E(P; N) = \{q : q \in N(p),\ p \in P\} \qquad (1)$$

In other words, the extension of a partition is a new partition which contains exactly 
all the neighboring points of the points in the original partition. 
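
The extension can be sketched in a few lines of Python. This is a minimal illustration, not code from the report: the names `N5` and `extend`, and the representation of a neighborhood function as a list of offsets, are assumptions made here for convenience.

```python
# Offsets of the 5-point stencil relative to p; (0, 0) is included because
# N(p) contains p itself. Representing N as an offset list is an assumption.
N5 = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]


def extend(partition, offsets):
    """Return E(P; N): every point q that lies in N(p) for some p in P."""
    return {(x + dx, y + dy) for (x, y) in partition for (dx, dy) in offsets}


P = {(0, 0), (1, 0)}   # hypothetical two-point partition
EP = extend(P, N5)
print(sorted(EP))
print(len(EP))         # 8 points: the two originals plus six new neighbors
```

Because every offset list contains (0, 0), the result always contains the original partition.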

Given any seed (initial) partition $S$ and neighborhood function $N$, we can
recursively define a family of partitions as follows:

$$P_1 = S$$
$$P_k = E(P_{k-1}; N) \quad \text{if } k > 1 \qquad (2)$$

If we denote $E(E(P; N); N)$ as $E^2(P; N)$, $E(E(E(P; N); N); N)$ as $E^3(P; N)$ and
so on, we can rewrite the above partition generation scheme as follows:

$$P_k = E^{k-1}(S; N) \qquad (3)$$

where $E^0(S; N) = S$.
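
The generation scheme of equation 3 amounts to applying the extension k - 1 times to the seed. The sketch below assumes the same offset-list representation of N; `family_member` is a hypothetical helper name, not one used in the report.

```python
N5 = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]  # 5-point stencil offsets


def extend(partition, offsets):
    """Return E(P; N) for a set of lattice points and a list of offsets."""
    return {(x + dx, y + dy) for (x, y) in partition for (dx, dy) in offsets}


def family_member(seed, offsets, k):
    """Return P_k = E^(k-1)(S; N), so that P_1 is the seed itself."""
    if k < 1:
        raise ValueError("k must be at least 1")
    partition = set(seed)
    for _ in range(k - 1):
        partition = extend(partition, offsets)
    return partition


sizes = [len(family_member({(0, 0)}, N5, k)) for k in range(1, 5)]
print(sizes)  # [1, 5, 13, 25]
```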

Since it is the geometric properties of a partition which are important here,
we consider two partitions $P_1$ and $P_2$ equivalent if $P_2$ is a translation of $P_1$, that is, if
there exists a translation vector $u = (u_x, u_y)$ such that

$$P_2 = T(P_1; u) = \{q : q = p + u,\ p \in P_1\}$$

where $T$ is the translation function. It is easy to see that the relation defined above
is reflexive, symmetric and transitive, hence it is indeed an equivalence relation. This
equivalence allows us to talk freely about the shape of the partitions without regard to
the origin of the coordinate system. Rotation equivalence and reflection equivalence
are not considered here.
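
One way to test translation equivalence in practice is to shift each partition so that its minimum coordinates sit at the origin and then compare the results. The following sketch assumes finite, non-empty partitions; `canonical` and `equivalent` are illustrative names introduced here.

```python
def canonical(partition):
    """Translate a finite, non-empty partition so min x and min y are both 0."""
    min_x = min(x for x, _ in partition)
    min_y = min(y for _, y in partition)
    return frozenset((x - min_x, y - min_y) for x, y in partition)


def equivalent(p1, p2):
    """True if p2 is a translation of p1 (rotations and reflections excluded)."""
    return canonical(p1) == canonical(p2)


plus = {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)}
shifted = {(x + 5, y - 3) for x, y in plus}
print(equivalent(plus, shifted))                                     # True
print(equivalent({(0, 0), (1, 0), (2, 0)}, {(0, 0), (0, 1), (0, 2)}))  # False
```

The second pair differs by a rotation, which the definition above deliberately does not treat as equivalent.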

One type of seed we will use very often is a rectangle of size $m \times n$, denoted as
$S_{m,n}$, where $S_{m,n} = \{(x, y) : 1 \le x \le m,\ 1 \le y \le n\}$. An important special case is the
single-point seed, $S_{1,1}$. If the seed is a single point, what kind of partitions will be
generated when $N$ is one of the neighborhood functions of the stencils shown
in figure 1? Figure 2 shows the cases for $N_5$, $N_7$ and $N_{9s}$. We shall call these kinds of
partitions diamonds, hexagons and squares respectively. It should be noted that this
hexagon is different from the one discussed by Reed et al. [3]. Reed's hexagon,
denoted as R-hex here, is actually a kind of diamond partition with variable seed
size, according to our classification. Suppose we choose $N_5$ as the neighborhood function.
If we set $S$ to $S_{2,2}$ for generating $P_1$, $S$ to $S_{3,2}$ for generating $P_2$, $S$ to $S_{4,2}$ for generating
$P_3$, and so on, we will get R-hex (see figure 2).
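
The R-hex construction can be sketched as follows, using the seed $S_{k+1,2}$ and $k - 1$ extensions under the 5-point stencil. The helper names (`rect_seed`, `r_hex`, `render`) are hypothetical, and the ASCII rendering is only a convenience for inspecting the shape.

```python
N5 = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]  # 5-point stencil offsets


def extend(partition, offsets):
    """Return E(P; N) for a set of lattice points and a list of offsets."""
    return {(x + dx, y + dy) for (x, y) in partition for (dx, dy) in offsets}


def rect_seed(m, n):
    """Return S_{m,n} = {(x, y) : 1 <= x <= m, 1 <= y <= n}."""
    return {(x, y) for x in range(1, m + 1) for y in range(1, n + 1)}


def r_hex(k):
    """Return the k-th R-hex: seed S_{k+1,2} extended k - 1 times under N5."""
    partition = rect_seed(k + 1, 2)
    for _ in range(k - 1):
        partition = extend(partition, N5)
    return partition


def render(partition):
    """Draw the partition as rows of '#' (member) and '.' (non-member)."""
    xs = [x for x, _ in partition]
    ys = [y for _, y in partition]
    rows = []
    for y in range(max(ys), min(ys) - 1, -1):
        rows.append("".join("#" if (x, y) in partition else "."
                            for x in range(min(xs), max(xs) + 1)))
    return "\n".join(rows)


sizes = [len(r_hex(k)) for k in range(1, 5)]
print(sizes)  # [4, 16, 36, 64]
print(render(r_hex(2)))
```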


Figure 2: Four kinds of partitions

3 Properties of Partitions

A partition $P$ is said to tessellate $Z^2$ if and only if for any finite region $R \subset Z^2$, there
exists a finite number $n$ of translation vectors $u_1, \ldots, u_n$ such that

1. $R \subseteq \bigcup_{i=1}^{n} T(P; u_i)$ and

2. $T(P; u_i) \cap T(P; u_j) = \emptyset$ for all $i \ne j$.

In other words, a partition tessellates if some copies of it cover any given region
without overlapping each other. In a given problem, if we use only one kind of partition
which tessellates the 2-D plane $Z^2$, we may reduce the programming effort, because
every processor will then see the same data structure and communication patterns
(except possibly at boundaries).
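
A finite check of the two conditions can be sketched for the diamond. The translation vectors used here, integer combinations of (r+1, r) and (-r, r+1) for a diamond of radius r, are an assumption of this sketch and are not taken from the report; the code only verifies them on one finite region rather than proving anything in general.

```python
from collections import Counter


def diamond(r):
    """Points within Manhattan distance r of the origin (P_{r+1} for N5)."""
    return {(x, y) for x in range(-r, r + 1) for y in range(-r, r + 1)
            if abs(x) + abs(y) <= r}


def lattice_vectors(a, b, span):
    """All integer combinations i*a + j*b with |i|, |j| <= span."""
    return [(i * a[0] + j * b[0], i * a[1] + j * b[1])
            for i in range(-span, span + 1) for j in range(-span, span + 1)]


def cover_counts(partition, vectors, region):
    """Count how many translated copies of the partition cover each region point."""
    counts = Counter()
    for ux, uy in vectors:
        for x, y in partition:
            q = (x + ux, y + uy)
            if q in region:
                counts[q] += 1
    return counts


r = 2
region = {(x, y) for x in range(-6, 7) for y in range(-6, 7)}
vectors = lattice_vectors((r + 1, r), (-r, r + 1), span=8)  # assumed basis
counts = cover_counts(diamond(r), vectors, region)
tiles = all(counts[q] == 1 for q in region)  # covered, and never twice
print(tiles)  # True
```

A count of 0 at some point would violate condition 1, and a count above 1 would violate condition 2.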

In general, only some of the partitions of the form $E^k(S; N)$ tessellate the 2-D plane.
The diamond, the hexagon, the square, and the R-hex are some examples. However,
the family of partitions derived from the 9-point cross stencil, $E^k(S_{1,1}; N_{9c})$, does not
tessellate, whereas those derived from the 13-point stencil also have the diamond shape.
We will only consider those partitions which tessellate.
The gird of a partition under a neighborhood function $N$ is defined to be

$$G(P; N) = E(P; N) \setminus P \qquad (4)$$

The gird points are exactly those external points which have to be accessed by a pro- 
cessor to which the partition is assigned. Figure 3 shows the girds of various partitions 
under different neighborhood functions (stencil structures). 

It is very important to note that the neighborhood functions $N$ in equations 3 and 4
can be different. For example, if we start with a single-point seed, and choose $N$ to
be $N_{9s}$ in equation 3 and $N$ to be $N_5$ in equation 4, then the number of gird points
is equal to $|G(E^{k-1}(S_{1,1}; N_{9s}); N_5)|$. However, interesting results do occur when the
neighborhood functions in equations 3 and 4 are the same.
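
The mixed case can be sketched directly: generate square partitions with the 9-point star stencil, then measure their girds under the 5-point stencil. The offset list `N9S` (the full 3 x 3 block) and the helper names are assumptions of this sketch; the counts it prints agree with the values 8k - 4 and 8k listed for the square partition in Table 3.

```python
N5 = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]
N9S = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]  # assumed 3x3 block


def extend(partition, offsets):
    """Return E(P; N) for a set of lattice points and a list of offsets."""
    return {(x + dx, y + dy) for (x, y) in partition for (dx, dy) in offsets}


def gird(partition, offsets):
    """Return G(P; N) = E(P; N) minus P: the external points P must access."""
    return extend(partition, offsets) - partition


square = {(0, 0)}  # P_1 for the single-point seed
results = []
for k in range(1, 5):
    results.append((k, len(gird(square, N5)), len(gird(square, N9S))))
    square = extend(square, N9S)  # P_{k+1} is generated with N9S
print(results)  # [(1, 4, 8), (2, 12, 16), (3, 20, 24), (4, 28, 32)]
```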

Since the neighborhood of a point includes the point itself by definition, it is obvious that
$P \subseteq E(P; N)$. Combining this fact with equation 4, we have

$$E(P; N) = P \cup G(P; N) \qquad (5)$$

Since $P \cap G(P; N) = \emptyset$ by the definition of $G$ (see equation 4), we also have

$$|E(P; N)| = |P| + |G(P; N)| \qquad (6)$$

Suppose the family of partitions $P_k$ is parametrized by $S$ and $N$. Then by applying
equation 6 to the definition of $P_k$ (equation 2), we have

$$|P_1| = |S|$$
$$|P_k| = |P_{k-1}| + |G(P_{k-1}; N)| \quad \text{if } k > 1 \qquad (7)$$

Solving the recurrence equations, we get the formula for finding the size of a partition:

$$|P_k| = |S| + \sum_{i=1}^{k-1} |G_i| \qquad (8)$$

where $G_i = G(P_i; N)$.
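
As a worked instance of the size formula, take the single-point seed with the 5-point stencil, for which Table 1 gives $|G_i| = 4i$. Summing the gird layers of $P_1$ through $P_{k-1}$ recovers the diamond size listed in the same table:

```latex
\begin{align*}
|P_k| &= |S_{1,1}| + \sum_{i=1}^{k-1} |G_i| \\
      &= 1 + \sum_{i=1}^{k-1} 4i \\
      &= 1 + 4 \cdot \frac{(k-1)k}{2} \\
      &= 1 + 2k(k-1)
\end{align*}
```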

Equation 8 expresses the size of a partition in terms of its successive layers of girds.
However, the size of a gird has to be found on a case by case basis. Tables 1 and 2
give the formulas for $|G_k|$ and $|P_k|$ when the seed is a single point ($S_{1,1}$) and a rectangle
($S_{m,n}$) respectively. They can be readily derived by using mathematical induction.

It is interesting to note that $E^{k-1}(S_{1,1}; N_{9s})$ only generates squares with sides of odd
length, with $|P_k| = 1 + 4k(k - 1) = (2k - 1)^2$, and $E^{k-1}(S_{2,2}; N_{9s})$ only generates squares
with sides of even length, with $|P_k| = 4 + (k - 1)(4k + 4) = (2k)^2$.

As a special case of diamond partitions with variable seed sizes, R-hex is generated
as $P_k$, where $P_k = E^{k-1}(S_{k+1,2}; N_5)$. Substituting $m = k + 1$ and $n = 2$ into the formulas
for $N_5$ in table 2, we get $|G_k| = 6k + 2$ and $|P_k| = 4k^2$.
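
The substitution for R-hex can be written out step by step, starting from the $N_5$ row of Table 2 with $m = k + 1$ and $n = 2$:

```latex
\begin{align*}
|G_k| &= 4k + 2m + 2n - 4 = 4k + 2(k + 1) + 4 - 4 = 6k + 2 \\
|P_k| &= mn + (k - 1)(2k + 2m + 2n - 4) \\
      &= 2(k + 1) + (k - 1)(4k + 2) \\
      &= (2k + 2) + (4k^2 - 2k - 2) = 4k^2
\end{align*}
```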



Figure 3: Girds of partitions with different stencils. The black circles are partition
points, and the white circles are gird points.
Table 1: Size of partitions and girds with single-point seed ($S_{1,1}$)

name of partition    $N$         $|G_k|$    $|P_k|$
diamond              $N_5$       $4k$       $1 + 2k(k - 1)$
hexagon              $N_7$       $6k$       $1 + 3k(k - 1)$
square               $N_{9s}$    $8k$       $1 + 4k(k - 1)$


Table 2: Size of partitions and girds with rectangular seed ($S_{m,n}$)

$N$         $|G_k|$                 $|P_k|$
$N_5$       $4k + 2m + 2n - 4$      $mn + (k - 1)(2k + 2m + 2n - 4)$
$N_7$       $6k + 2m + 2n - 4$      $mn + (k - 1)(3k + 2m + 2n - 4)$
$N_{9s}$    $8k + 2m + 2n - 4$      $mn + (k - 1)(4k + 2m + 2n - 4)$
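
The closed forms in Table 2 can be checked by brute force for small seeds. In this sketch the offset lists for $N_7$ and $N_{9s}$ are assumptions (the 5-point stencil plus the (1,1) and (-1,-1) diagonals, and the full 3 x 3 block, respectively), and `matches_table2` is a hypothetical helper name.

```python
N5 = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]
N7 = N5 + [(1, 1), (-1, -1)]                                # assumed diagonals
N9S = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]  # assumed 3x3 block

# Each stencil is paired with c, the coefficient of k inside the |P_k| formula;
# the |G_k| formula then has 2c as the coefficient of k.
STENCILS = {"N5": (N5, 2), "N7": (N7, 3), "N9s": (N9S, 4)}


def extend(partition, offsets):
    """Return E(P; N) for a set of lattice points and a list of offsets."""
    return {(x + dx, y + dy) for (x, y) in partition for (dx, dy) in offsets}


def matches_table2(m, n, k_max=5):
    """Compare brute-force |P_k| and |G_k| with the Table 2 formulas."""
    for offsets, c in STENCILS.values():
        partition = {(x, y) for x in range(1, m + 1) for y in range(1, n + 1)}
        for k in range(1, k_max + 1):
            extended = extend(partition, offsets)
            size_ok = len(partition) == m * n + (k - 1) * (c * k + 2 * m + 2 * n - 4)
            gird_ok = len(extended - partition) == 2 * c * k + 2 * m + 2 * n - 4
            if not (size_ok and gird_ok):
                return False
            partition = extended
    return True


all_match = all(matches_table2(m, n) for m in range(1, 5) for n in range(1, 5))
print(all_match)
```

Setting m = n = 1 reduces the same check to the single-point formulas of Table 1.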


4 Comparison of Partitions 

For a given partition $P$ with $N_c$ as the stencil used in the communication, we assume
that the amount of computation workload is equal to the size of the partition, $|P|$,
and the amount of communication is equal to the size of the gird, $|G(P; N_c)|$. This
assumption was also used in [3]. The computation to communication ratio is thus
defined to be

$$CCR = |P| / |G(P; N_c)| \qquad (9)$$

For example, if we use the 7-point stencil communication structure, but choose to divide
the domain into diamond partitions $P_k$, then the amount of computation is equal to
$1 + 2k(k - 1)$, and the amount of communication is equal to $|G(P_k; N_7)| = 6k$ (see
figure 3). Table 3 shows the amount of communication for the different combinations of
stencils and partitions.
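
The diamond example above can be reproduced numerically. The sketch generates diamonds with the 5-point stencil, measures their girds under an assumed 7-point offset list (the 5-point stencil plus the (1,1) and (-1,-1) diagonals), and reports the resulting ratio; all names are illustrative.

```python
N5 = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]
N7 = N5 + [(1, 1), (-1, -1)]  # assumed offsets for the 7-point stencil


def extend(partition, offsets):
    """Return E(P; N) for a set of lattice points and a list of offsets."""
    return {(x + dx, y + dy) for (x, y) in partition for (dx, dy) in offsets}


def gird(partition, offsets):
    """Return G(P; N) = E(P; N) minus P."""
    return extend(partition, offsets) - partition


rows = []
partition = {(0, 0)}  # diamond P_1
for k in range(1, 6):
    computation = len(partition)             # |P_k|
    communication = len(gird(partition, N7)) # |G(P_k; N_7)|
    rows.append((k, computation, communication, computation / communication))
    partition = extend(partition, N5)        # next diamond, P_{k+1}

for k, comp, comm, ccr in rows:
    print(f"k={k}  |P|={comp}  |G|={comm}  CCR={ccr:.3f}")
```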

Since the partitions have different shapes, it is not always possible to divide a given
domain into identical subdomains such that each subdomain matches the right shape and
size of a partition one would like to use. We may have to use a bigger partition of the
same shape, but this may increase the amount of computation and change the pattern
of communication. However, we can still compare the computation to communication
ratio (CCR) of the various partitioning schemes in the asymptotic sense (see table 4).
It is easy to see that for a given kind of partition, the asymptotic CCR is independent of
the seed. In table 4, $A$ denotes the number of points contained in a partition. Note that
different partitions may have different sets of possible $A$ values.
Table 3: Amount of communication, $|G(P_k; N_c)|$

stencil, $N_c$    diamond     hexagon      square       diamond[$S_{m,n}$]      R-hex
$N_5$             $4k$        $6k - 2$     $8k - 4$     $4k + 2m + 2n - 4$      $6k + 2$
$N_7$             $6k$        $6k$         $8k - 2$     $6k + 2m + 2n - 4$      $8k + 2$
$N_{9s}$          $8k$        $8k$         $8k$         $8k + 2m + 2n - 4$      $10k + 2$
$N_{9c}$          $8k + 4$    $12k - 4$    $16k - 8$    $8k + 4m + 4n - 4$      $12k + 8$
$N_{13}$          $8k + 4$    $12k$        $16k - 4$    $8k + 4m + 4n - 4$      $12k + 8$



To calculate the (asymptotic) CCR, we let the area of a partition $P_k$ be a constant
$A$, and solve for $k$. For example, if we use the diamond partition $P_k$ and the 7-point
stencil, then we have

$$|P_k| = 1 + 2k(k - 1) = A$$

Solving for $k$,

$$k = (\sqrt{2A - 1} + 1)/2$$

Hence,

$$CCR = |P_k| / |G(P_k; N_7)| = A/(6k) = A / (3(\sqrt{2A - 1} + 1)) \approx \sqrt{A/18}$$


Similarly, we can derive the values in table 4 from tables 1, 2 and 3 according to 
equation 9. 
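
The quality of the asymptotic approximation in the diamond and 7-point example can be checked numerically. This is only an illustrative sketch; the function names are hypothetical.

```python
import math


def exact_ccr(k):
    """Exact CCR of the diamond P_k under the 7-point stencil: A / (6k)."""
    area = 1 + 2 * k * (k - 1)
    return area, area / (6 * k)


def asymptotic_ccr(area):
    """Asymptotic form sqrt(A / 18) obtained by dropping lower-order terms."""
    return math.sqrt(area / 18)


ratios = []
for k in (10, 100, 1000):
    area, exact = exact_ccr(k)
    approx = asymptotic_ccr(area)
    ratios.append(exact / approx)
    print(f"k={k}  A={area}  exact={exact:.4f}  asymptotic={approx:.4f}")
```

The ratio of the exact value to the approximation behaves roughly like 1 - 1/(2k), so the two agree to within about 0.05% at k = 1000.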

From table 4 we have the following observations:

1. In all the cases considered, CCR is proportional to $\sqrt{A}$. This is not surprising,
because the size of a partition is a quadratic function of $k$, while the size of the
corresponding gird is a linear function of $k$.


2. For each partition, CCR decreases or stays the same as $|N_c|$ increases. This is
because, for the same area $A$, the number of gird points increases or stays the
same as more points are contained in the communication stencil.


3. Diamond partitions yield the highest CCR ($\sqrt{A/8}$) for $N_5$, hexagons are best
($\sqrt{A/12}$) for $N_7$, squares are best ($\sqrt{A/16}$) for $N_{9s}$, and diamond partitions are
also best ($\sqrt{A/32}$) for both $N_{9c}$ and $N_{13}$ stencils. This pattern suggests that




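Table 4: Asymptotic computation to communication ratio (CCR)

stencil, $N_c$    diamond          hexagon           square           R-hex
$N_5$             $\sqrt{A/8}$     $\sqrt{A/12}$     $\sqrt{A/16}$    $\sqrt{A/9}$
$N_7$             $\sqrt{A/18}$    $\sqrt{A/12}$     $\sqrt{A/16}$    $\sqrt{A/16}$
$N_{9s}$          $\sqrt{A/32}$    $\sqrt{3A/64}$    $\sqrt{A/16}$    $\sqrt{A/25}$
$N_{9c}$          $\sqrt{A/32}$    $\sqrt{A/48}$     $\sqrt{A/64}$    $\sqrt{A/36}$
$N_{13}$          $\sqrt{A/32}$    $\sqrt{A/48}$     $\sqrt{A/64}$    $\sqrt{A/36}$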


Table 5: Normalized asymptotic CCR

stencil, $N_c$    diamond    hexagon    square    R-hex
$N_5$             1.41       1.15       1         1.33
$N_7$             0.94       1.15       1         1
$N_{9s}$          0.71       0.87       1         0.8
$N_{9c}$          1.41       1.15       1         1.33
$N_{13}$          1.41       1.15       1         1.33

Table 6: Number of neighboring partitions (for $P_k$ when $k > 3$)

stencil, $N_c$    diamond    hexagon    square    R-hex
$N_5$             4          6          4         6
$N_7$             6          6          6         6
$N_{9s}$          8          6          8         6
$N_{9c}$          8          6          4         6
$N_{13}$          8          6          8         6


there is a formal relationship between the optimal partition and the chosen stencil:
the partition derived from a stencil $N_c$ is the optimal partition in terms of
computation to communication ratio when $N_c$ is also the communication stencil,
for $N_c = N_5, N_7, N_{9s}$.

4. The results on selecting the optimal stencil/partition pairs reported in [3] (see 
section 1) correspond to the last two columns of table 4. 

5. R-hex is the second best whenever diamond is the best (when $N_c = N_5, N_{9c}, N_{13}$).
It is the second worst whenever diamond is the worst (when $N_c = N_7, N_{9s}$). It is
never the optimal partition in any of the cases considered.

Since square partitions probably result in the most regular data structures, we are
especially interested in knowing how well square partitions compare with other partitions.
Hence, the asymptotic CCR values normalized with respect to square partitions are calculated
and displayed in table 5. It shows that, in all the cases considered, no other partition
achieves a CCR more than 41% higher than that of the square partition.
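
The normalized values can be recomputed from the leading coefficients of the size and gird formulas. If $|P_k| \approx c k^2$ and $|G(P_k; N_c)| \approx g k$, then CCR is approximately $\sqrt{A c} / g$, and dividing by the square partition's value gives the normalized figure. The dictionaries below transcribe the leading coefficients from Tables 1 and 3 and from $|P_k| = 4k^2$ for R-hex; the names are illustrative.

```python
import math

# Leading coefficient c of k^2 in |P_k| for each kind of partition.
AREA_COEFF = {"diamond": 2, "hexagon": 3, "square": 4, "R-hex": 4}

# Leading coefficient g of k in |G(P_k; N_c)|, read from Table 3.
GIRD_COEFF = {
    "N5":  {"diamond": 4, "hexagon": 6,  "square": 8,  "R-hex": 6},
    "N7":  {"diamond": 6, "hexagon": 6,  "square": 8,  "R-hex": 8},
    "N9s": {"diamond": 8, "hexagon": 8,  "square": 8,  "R-hex": 10},
    "N9c": {"diamond": 8, "hexagon": 12, "square": 16, "R-hex": 12},
    "N13": {"diamond": 8, "hexagon": 12, "square": 16, "R-hex": 12},
}


def ccr_constant(stencil, shape):
    """Return sqrt(c) / g, the factor multiplying sqrt(A) in the asymptotic CCR."""
    return math.sqrt(AREA_COEFF[shape]) / GIRD_COEFF[stencil][shape]


def normalized(stencil, shape):
    """Asymptotic CCR of a shape divided by that of the square partition."""
    return ccr_constant(stencil, shape) / ccr_constant(stencil, "square")


table5 = {stencil: {shape: round(normalized(stencil, shape), 2)
                    for shape in AREA_COEFF}
          for stencil in GIRD_COEFF}
for stencil, row in table5.items():
    print(stencil, row)
```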

For our purpose of finding the optimal partitions under different cases, rectangular 
stripes, rectangular partitions and triangular partitions are not considered. They have 
been previously shown to be inferior to squares or R-hexes [3].

Good performance involves many factors. Communication cost depends not only
on the total amount of communication, but also on the actual patterns of
communication, such as the number of communicating neighbors (see table 6), and on the
underlying machine architecture. This report intends to give the asymptotic bound on
one of these issues: optimal partitioning with respect to the computation to communication
ratio. Maximizing the computation to communication ratio does not necessarily
guarantee minimum execution time of a parallel program, but it is still an important
indicator of the potential performance of the program. It is interesting to see how much
this ratio varies under different combinations of stencils and partitions.



5 Conclusion 


This report has presented an analysis for selecting optimal partitions for regular com- 
putation over two-dimensional meshes given the communication stencil. The criterion 
used is the computation to communication ratio, which is defined to be the ratio of the 
size of a partition to that of its gird. It is shown that diamond partitions are best for 
5-point stencils, 9-point cross stencils and 13-point stencils, hexagonal partitions are 
best for 7-point stencils, and square partitions are best for 9-point star stencils. 